Using codebooks of fragmented connected-component contours in forensic and historic writer identification

Authors

  • Lambert Schomaker
  • Katrin Franke
  • Marius Bulacu
Abstract

Recent advances in 'off-line' writer identification allow for new applications in handwritten text retrieval from archives of scanned historical documents. This paper describes new algorithms for forensic or historical writer identification, using the contours of fragmented connected-components in free-style handwriting. The writer is considered to be characterized by a stochastic pattern generator, producing a family of character fragments (fraglets). Using a codebook of such fraglets from an independent training set, the probability distribution of fraglet contours was computed for an independent test set. Results revealed a high sensitivity of the fraglet histogram in identifying individual writers on the basis of a paragraph of text. Large-scale experiments on the optimal size of Kohonen maps of fraglet contours were performed, showing usable classification rates within a noncritical range of Kohonen map dimensions. The proposed automatic approach bridges the gap between image-statistics approaches and purely knowledge-based manual character-based methods.
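
The fraglet-histogram idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the codebook has already been trained (the abstract mentions Kohonen maps for this step, which is not reproduced here), that each fraglet contour has been resampled to a fixed-length vector, that fraglets are assigned to the nearest codebook entry by Euclidean distance, and that histograms are compared with a chi-square distance. The function names `fraglet_histogram`, `chi_square_distance` and `identify_writer` are hypothetical.

```python
import numpy as np


def fraglet_histogram(contours, codebook):
    """Return the normalized histogram of nearest-codebook-entry assignments.

    contours: (n, d) array, one fixed-length contour vector per fraglet (assumed format).
    codebook: (k, d) array of prototype contours (assumed to be trained elsewhere).
    """
    contours = np.asarray(contours, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    if len(contours) == 0:
        raise ValueError("at least one fraglet contour is required")
    # Squared Euclidean distance from every fraglet to every codebook entry.
    d2 = ((contours[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    counts = np.bincount(nearest, minlength=len(codebook)).astype(float)
    # Normalize counts so the histogram is a discrete probability distribution.
    return counts / counts.sum()


def chi_square_distance(p, q, eps=1e-12):
    """Chi-square distance between two histograms (an assumed choice of measure)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(((p - q) ** 2 / (p + q + eps)).sum())


def identify_writer(query_hist, reference_hists):
    """Return the writer id whose reference histogram is closest to the query."""
    return min(
        reference_hists,
        key=lambda writer: chi_square_distance(query_hist, reference_hists[writer]),
    )
```

In this sketch, a paragraph of text from an unknown writer would be reduced to one histogram with `fraglet_histogram` and then matched against one stored histogram per known writer with `identify_writer`.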


Similar resources

Beyond OCR: Handwritten Manuscript Attribute Understanding

Historical manuscript dating has always been an important challenge for historians but since countless manuscripts have become digitally available recently, the pattern recognition community has started addressing the dating problem as well. In this chapter, we present a family of local contour fragments (kCF) and stroke fragments (kSF) features and study their application to historical documen...



Writer Identification and Verification: A Review

Writer identification and verification has been studied over the past decade due to the high demand of such implementations in a wide variety of offline handwritten data applications. Among the applications are verification of the identity of a person whom he or she claimed to be and forensic and historic document analysis in identifying the identity of the writer of the documents. This paper p...


Identify Handwriting Individually Using Feed Forward Neural Networks

The paper justifies the necessity to use the hand writer identification using the feed forward neural networks. Identifying the authors of a handwritten sample using automatic image-based processing methods is an interesting pattern recognition problem with direct applicability in the legal and historic documents. Leading a worrisome life among the harder forms of biometrics, the identification...


Connected Component Based Word Spotting on Persian Handwritten image documents

Word spotting is to make searchable unindexed image documents by locating word/words in a document image, given a query word. This problem is challenging, mainly due to the large number of word classes with very small inter-class and substantial intra-class distances. In this paper, a segmentation-based word spotting method is presented for multi-writer Persian handwritten doc-...



Journal:
  • Pattern Recognition Letters

Volume 28

Publication date: 2007